{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# College Admissions Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the geostates package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`geostates` can be used to create choropleth plots of the United States or individual states. It is easy to use\n",
    "so we will start out with an example to show you some of the ins and outs of the package."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Admissions analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Goal:** To illustrate the power of the package, we will start out by creating a plot that shows how the number of Princeton University acceptances varies by state in the United States."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will start by importing the `pandas` and `geostates` packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading in the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example, we use admissions data on the *Princeton University Class of 2025* from the [Princeton University Undergraduate Admissions Department](https://admission.princeton.edu/apply/admission-statistics). The CSV includes the total number of admits in the United States as of 30 August 2021 broken down by each geography (state)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read in the data\n",
    "admissions_data = pd.read_csv('Desktop/admissions_data_22.csv', index_col='state')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>admits</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Washington</th>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>140</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nevada</th>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Montana</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            admits\n",
       "state             \n",
       "Washington      17\n",
       "Oregon           6\n",
       "California     140\n",
       "Nevada           4\n",
       "Montana          1"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# take a look at what the CSV file looks like\n",
    "admissions_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyzing the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's take a look at which states have the most admits by sorting the list by descending values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>admits</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>New Jersey</th>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>140</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>New York</th>\n",
       "      <td>133</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Massachusetts</th>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pennsylvania</th>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Connecticut</th>\n",
       "      <td>42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Maryland</th>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Illinois</th>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               admits\n",
       "state                \n",
       "New Jersey        188\n",
       "California        140\n",
       "New York          133\n",
       "Massachusetts      76\n",
       "Pennsylvania       68\n",
       "Texas              52\n",
       "Florida            50\n",
       "Connecticut        42\n",
       "Maryland           41\n",
       "Illinois           32"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sort the values to see which states have the most admits\n",
    "sorted_admits_data = admissions_data.sort_values(by='admits', ascending=False)\n",
    "\n",
    "# view the first 10 values of the sorted pandas dataframe\n",
    "sorted_admits_data.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The table above shows that New Jersey, California, and New York have the most number of admits for the Princeton undergraduate class of 2025."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total admits from top three states: 461 students\n",
      "Total domestic admits: 1145 students\n",
      "40.26% of domestic admits come from NJ, CA, and NY\n"
     ]
    }
   ],
   "source": [
    "# see what percent of the total number of domestic admits come from these top three states\n",
    "\n",
    "# calculate the total number of admits from New Jersey, California, and New York\n",
    "top_three_total_admits = sorted_admits_data.head(3)['admits'].sum()\n",
    "print('Total admits from top three states:', top_three_total_admits, 'students')\n",
    "\n",
    "# calculate the total number of domestic admits\n",
    "total_domestic_admits = sorted_admits_data['admits'].sum()\n",
    "print('Total domestic admits:', total_domestic_admits, 'students')\n",
    "\n",
    "# calculate the percent of the total admits that these three states contribute\n",
    "percent = (top_three_total_admits/total_domestic_admits)\n",
    "print('{:.2%}'.format(percent), 'of domestic admits come from NJ, CA, and NY')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is interesting! It turns out just **three states** comprise over 40% of the domestic undergraduate admits to Princeton University."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the data using geostates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step for using the `geostates` package is to load in the geodataframe containing all of the state values. For this, we will use the `load_states()` function and assign it to a value `df`. Once we've loaded in the geodataframe we need to merge it with out cattle data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the load_states() function from the geostates package\n",
    "from geostates.shapefiles import load_states"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>STATEFP</th>\n",
       "      <th>STATENS</th>\n",
       "      <th>AFFGEOID</th>\n",
       "      <th>GEOID</th>\n",
       "      <th>NAME</th>\n",
       "      <th>LSAD</th>\n",
       "      <th>ALAND</th>\n",
       "      <th>AWATER</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>STUSPS</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>MS</th>\n",
       "      <td>28</td>\n",
       "      <td>01779790</td>\n",
       "      <td>0400000US28</td>\n",
       "      <td>28</td>\n",
       "      <td>Mississippi</td>\n",
       "      <td>00</td>\n",
       "      <td>121533519481</td>\n",
       "      <td>3926919758</td>\n",
       "      <td>MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NC</th>\n",
       "      <td>37</td>\n",
       "      <td>01027616</td>\n",
       "      <td>0400000US37</td>\n",
       "      <td>37</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>00</td>\n",
       "      <td>125923656064</td>\n",
       "      <td>13466071395</td>\n",
       "      <td>MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OK</th>\n",
       "      <td>40</td>\n",
       "      <td>01102857</td>\n",
       "      <td>0400000US40</td>\n",
       "      <td>40</td>\n",
       "      <td>Oklahoma</td>\n",
       "      <td>00</td>\n",
       "      <td>177662925723</td>\n",
       "      <td>3374587997</td>\n",
       "      <td>POLYGON ((-103.00257 36.52659, -103.00219 36.6...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VA</th>\n",
       "      <td>51</td>\n",
       "      <td>01779803</td>\n",
       "      <td>0400000US51</td>\n",
       "      <td>51</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>102257717110</td>\n",
       "      <td>8528531774</td>\n",
       "      <td>MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WV</th>\n",
       "      <td>54</td>\n",
       "      <td>01779805</td>\n",
       "      <td>0400000US54</td>\n",
       "      <td>54</td>\n",
       "      <td>West Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>62266474513</td>\n",
       "      <td>489028543</td>\n",
       "      <td>POLYGON ((-82.64320 38.16909, -82.64300 38.169...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       STATEFP   STATENS     AFFGEOID GEOID            NAME LSAD  \\\n",
       "STUSPS                                                             \n",
       "MS          28  01779790  0400000US28    28     Mississippi   00   \n",
       "NC          37  01027616  0400000US37    37  North Carolina   00   \n",
       "OK          40  01102857  0400000US40    40        Oklahoma   00   \n",
       "VA          51  01779803  0400000US51    51        Virginia   00   \n",
       "WV          54  01779805  0400000US54    54   West Virginia   00   \n",
       "\n",
       "               ALAND       AWATER  \\\n",
       "STUSPS                              \n",
       "MS      121533519481   3926919758   \n",
       "NC      125923656064  13466071395   \n",
       "OK      177662925723   3374587997   \n",
       "VA      102257717110   8528531774   \n",
       "WV       62266474513    489028543   \n",
       "\n",
       "                                                 geometry  \n",
       "STUSPS                                                     \n",
       "MS      MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...  \n",
       "NC      MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...  \n",
       "OK      POLYGON ((-103.00257 36.52659, -103.00219 36.6...  \n",
       "VA      MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...  \n",
       "WV      POLYGON ((-82.64320 38.16909, -82.64300 38.169...  "
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load in the geodataframe and assign it to df\n",
    "df = load_states()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Merging the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to sucessfully create a choropleth map of the college admissions data, we need to merge it with the geodataframe that contains all the information for creating the plots of the states. We can do this by using the `pandas merge` function. Since the index for the college admissions data is `state` and our geodataframe contains a similar column (`NAME`) we can use this value to merge both dataframes. Let's start out by renaming the `NAME` column in our geodataframe to `state` so that the names of both columns match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>STATEFP</th>\n",
       "      <th>STATENS</th>\n",
       "      <th>AFFGEOID</th>\n",
       "      <th>GEOID</th>\n",
       "      <th>state</th>\n",
       "      <th>LSAD</th>\n",
       "      <th>ALAND</th>\n",
       "      <th>AWATER</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>STUSPS</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>MS</th>\n",
       "      <td>28</td>\n",
       "      <td>01779790</td>\n",
       "      <td>0400000US28</td>\n",
       "      <td>28</td>\n",
       "      <td>Mississippi</td>\n",
       "      <td>00</td>\n",
       "      <td>121533519481</td>\n",
       "      <td>3926919758</td>\n",
       "      <td>MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NC</th>\n",
       "      <td>37</td>\n",
       "      <td>01027616</td>\n",
       "      <td>0400000US37</td>\n",
       "      <td>37</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>00</td>\n",
       "      <td>125923656064</td>\n",
       "      <td>13466071395</td>\n",
       "      <td>MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OK</th>\n",
       "      <td>40</td>\n",
       "      <td>01102857</td>\n",
       "      <td>0400000US40</td>\n",
       "      <td>40</td>\n",
       "      <td>Oklahoma</td>\n",
       "      <td>00</td>\n",
       "      <td>177662925723</td>\n",
       "      <td>3374587997</td>\n",
       "      <td>POLYGON ((-103.00257 36.52659, -103.00219 36.6...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VA</th>\n",
       "      <td>51</td>\n",
       "      <td>01779803</td>\n",
       "      <td>0400000US51</td>\n",
       "      <td>51</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>102257717110</td>\n",
       "      <td>8528531774</td>\n",
       "      <td>MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WV</th>\n",
       "      <td>54</td>\n",
       "      <td>01779805</td>\n",
       "      <td>0400000US54</td>\n",
       "      <td>54</td>\n",
       "      <td>West Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>62266474513</td>\n",
       "      <td>489028543</td>\n",
       "      <td>POLYGON ((-82.64320 38.16909, -82.64300 38.169...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       STATEFP   STATENS     AFFGEOID GEOID           state LSAD  \\\n",
       "STUSPS                                                             \n",
       "MS          28  01779790  0400000US28    28     Mississippi   00   \n",
       "NC          37  01027616  0400000US37    37  North Carolina   00   \n",
       "OK          40  01102857  0400000US40    40        Oklahoma   00   \n",
       "VA          51  01779803  0400000US51    51        Virginia   00   \n",
       "WV          54  01779805  0400000US54    54   West Virginia   00   \n",
       "\n",
       "               ALAND       AWATER  \\\n",
       "STUSPS                              \n",
       "MS      121533519481   3926919758   \n",
       "NC      125923656064  13466071395   \n",
       "OK      177662925723   3374587997   \n",
       "VA      102257717110   8528531774   \n",
       "WV       62266474513    489028543   \n",
       "\n",
       "                                                 geometry  \n",
       "STUSPS                                                     \n",
       "MS      MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...  \n",
       "NC      MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...  \n",
       "OK      POLYGON ((-103.00257 36.52659, -103.00219 36.6...  \n",
       "VA      MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...  \n",
       "WV      POLYGON ((-82.64320 38.16909, -82.64300 38.169...  "
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rename the 'NAME' column in the geodataframe to 'State'\n",
    "geo_df = df.rename(columns={'NAME': 'state'})\n",
    "geo_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Important:** To make sure that we do not accidentally loose any important data during the merge, we need to make sure that we include the `how='outer'` parameter in the merge statement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>admits</th>\n",
       "      <th>STATEFP</th>\n",
       "      <th>STATENS</th>\n",
       "      <th>AFFGEOID</th>\n",
       "      <th>GEOID</th>\n",
       "      <th>LSAD</th>\n",
       "      <th>ALAND</th>\n",
       "      <th>AWATER</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Washington</td>\n",
       "      <td>17</td>\n",
       "      <td>53</td>\n",
       "      <td>01779804</td>\n",
       "      <td>0400000US53</td>\n",
       "      <td>53</td>\n",
       "      <td>00</td>\n",
       "      <td>172112588220</td>\n",
       "      <td>12559278850</td>\n",
       "      <td>MULTIPOLYGON (((-122.57039 48.53785, -122.5686...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Oregon</td>\n",
       "      <td>6</td>\n",
       "      <td>41</td>\n",
       "      <td>01155107</td>\n",
       "      <td>0400000US41</td>\n",
       "      <td>41</td>\n",
       "      <td>00</td>\n",
       "      <td>248606993270</td>\n",
       "      <td>6192386935</td>\n",
       "      <td>MULTIPOLYGON (((-123.59892 46.25145, -123.5984...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>California</td>\n",
       "      <td>140</td>\n",
       "      <td>06</td>\n",
       "      <td>01779778</td>\n",
       "      <td>0400000US06</td>\n",
       "      <td>06</td>\n",
       "      <td>00</td>\n",
       "      <td>403503931312</td>\n",
       "      <td>20463871877</td>\n",
       "      <td>MULTIPOLYGON (((-118.60442 33.47855, -118.5987...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Nevada</td>\n",
       "      <td>4</td>\n",
       "      <td>32</td>\n",
       "      <td>01779793</td>\n",
       "      <td>0400000US32</td>\n",
       "      <td>32</td>\n",
       "      <td>00</td>\n",
       "      <td>284329506470</td>\n",
       "      <td>2047206072</td>\n",
       "      <td>POLYGON ((-120.00574 39.22866, -120.00559 39.2...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Montana</td>\n",
       "      <td>1</td>\n",
       "      <td>30</td>\n",
       "      <td>00767982</td>\n",
       "      <td>0400000US30</td>\n",
       "      <td>30</td>\n",
       "      <td>00</td>\n",
       "      <td>376962738765</td>\n",
       "      <td>3869208832</td>\n",
       "      <td>POLYGON ((-116.04914 48.50205, -116.04913 48.5...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        state  admits STATEFP   STATENS     AFFGEOID GEOID LSAD         ALAND  \\\n",
       "0  Washington      17      53  01779804  0400000US53    53   00  172112588220   \n",
       "1      Oregon       6      41  01155107  0400000US41    41   00  248606993270   \n",
       "2  California     140      06  01779778  0400000US06    06   00  403503931312   \n",
       "3      Nevada       4      32  01779793  0400000US32    32   00  284329506470   \n",
       "4     Montana       1      30  00767982  0400000US30    30   00  376962738765   \n",
       "\n",
       "        AWATER                                           geometry  \n",
       "0  12559278850  MULTIPOLYGON (((-122.57039 48.53785, -122.5686...  \n",
       "1   6192386935  MULTIPOLYGON (((-123.59892 46.25145, -123.5984...  \n",
       "2  20463871877  MULTIPOLYGON (((-118.60442 33.47855, -118.5987...  \n",
       "3   2047206072  POLYGON ((-120.00574 39.22866, -120.00559 39.2...  \n",
       "4   3869208832  POLYGON ((-116.04914 48.50205, -116.04913 48.5...  "
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.merge(admissions_data, geo_df, on='state', how='outer')\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the plot_states() function from geostates\n",
    "from geostates.plot import plot_states"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a choropleth map that displays the admits for each state in the United States\n",
    "# plot = plot_states(data_2, column='admits', cmap=new_cmap, labels='both', linestyle='none', legend='legend',\n",
    "                   #bins=15)\n",
    "\n",
    "# add a title to the plot\n",
    "# plot.annotate('Princeton Admissions Data 2022', xy=(-97, 50.5), fontsize=18, ha='center');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}